Abstract MDP Reward Shaping for Multi-Agent Reinforcement Learning

Authors

  • Kyriakos Efthymiadis
  • Sam Devlin
  • Daniel Kudenko
Affiliation: Department of Computer Science, The University of York, UK

Abstract

Reward shaping has been shown to significantly improve an agent’s performance in reinforcement learning. As attention shifts from tabula-rasa approaches to methods where some heuristic domain knowledge can be given to agents, an important problem that arises is how to generate a useful potential function. Previous research demonstrated the use of plan-based reward shaping in multi-agent reinforcement learning (MARL), where STRIPS planning was used to generate a potential function. The results showed that potential functions based on joint plans can improve an agent’s performance. When using individual plans, however, the agents face conflicting goals, which can have a detrimental effect on performance. In this paper we present the use of abstract MDPs as a method to provide heuristic knowledge in MARL and show how it can be utilised for conflict resolution when agent communication and goal-shaping are not possible.
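As a rough illustration of the idea in the abstract, the sketch below solves a small abstract MDP by value iteration and uses the resulting values as a potential function Φ in the standard potential-based shaping term F(s, s') = γΦ(s') − Φ(s). Everything specific here is an assumption made for illustration and is not taken from the paper: the three abstract states, the `abstraction` mapping, the low-level state layout `(position, carrying)`, and the function names are all hypothetical.

```python
from typing import Callable, Dict, Hashable, List, Tuple

State = Hashable
# transitions[s][a] is a list of (probability, next_state, reward) outcomes.
Transitions = Dict[State, Dict[str, List[Tuple[float, State, float]]]]


def value_iteration(transitions: Transitions, gamma: float = 0.9,
                    tol: float = 1e-8) -> Dict[State, float]:
    """Solve a small abstract MDP; states with no actions are terminal."""
    values = {s: 0.0 for s in transitions}
    while True:
        delta = 0.0
        for s, actions in transitions.items():
            if not actions:  # terminal state keeps value 0
                continue
            best = max(
                sum(p * (r + gamma * values[s2]) for p, s2, r in outcomes)
                for outcomes in actions.values()
            )
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            return values


def shaping_reward(potential: Dict[State, float],
                   abstraction: Callable[[State], State],
                   s: State, s_next: State, gamma: float) -> float:
    """Potential-based shaping term F(s, s') = gamma * phi(s') - phi(s)."""
    return gamma * potential[abstraction(s_next)] - potential[abstraction(s)]


# Hypothetical abstract MDP: fetch a flag, then bring it to the goal.
abstract_mdp: Transitions = {
    "start": {"advance": [(1.0, "has_flag", 0.0)]},
    "has_flag": {"advance": [(1.0, "goal", 1.0)]},
    "goal": {},
}


def abstraction(state: Tuple[int, bool]) -> str:
    """Hypothetical mapping from a low-level (position, carrying) state."""
    position, carrying = state
    if carrying and position == 0:
        return "goal"
    return "has_flag" if carrying else "start"


GAMMA = 0.9
potential = value_iteration(abstract_mdp, gamma=GAMMA)
progress = shaping_reward(potential, abstraction, (3, False), (4, True), GAMMA)
no_progress = shaping_reward(potential, abstraction, (3, False), (2, False), GAMMA)
```

In this toy setting a transition that advances in the abstract MDP receives a larger shaping term than one that stays in the same abstract state, and that relative difference is what biases exploration. In a multi-agent setting each agent would add its own shaping term to its environment reward. How the paper itself builds the abstract MDP and resolves conflicts between agents is not specified in this abstract.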

Similar resources

A comparison of plan-based and abstract MDP reward shaping

Reward shaping has been shown to significantly improve an agent’s performance in reinforcement learning. As attention is shifting away from tabula-rasa approaches, many different reward shaping methods have been developed. In this paper we compare two different methods for reward shaping: plan-based, in which an agent is provided with a plan and extra rewards are given according to the steps of ...

Combining manual feedback with subsequent MDP reward signals for reinforcement learning

As learning agents move from research labs to the real world, it is increasingly important that human users, including those without programming skills, be able to teach agents desired behaviors. Recently, the TAMER framework was introduced for designing agents that can be interactively shaped by human trainers who give only positive and negative feedback signals. Past work on TAMER showed that...

Effects of Shaping a Reward on Multiagent Reinforcement Learning

In reinforcement learning problems, agents take sequential actions with the goal of maximizing a time-delayed reward. In this chapter, the design of reward shaping for a continuing task in a multiagent domain is investigated. We use an interesting example, keepaway soccer (Kuhlmann, 2003; Stone, 2002; Stone, 2006), in which a team tries to maintain ball possession by avoiding the opponent’s int...

Plan-based reward shaping for multi-agent reinforcement learning

Recent theoretical results have justified the use of potential-based reward shaping as a way to improve the performance of multi-agent reinforcement learning (MARL). However, the question remains of how to generate a useful potential function. Previous research demonstrated the use of STRIPS operator knowledge to automatically generate a potential function for single-agent reinforcement learnin...

Multi-agent, reward shaping for RoboCup KeepAway

This paper investigates the impact of reward shaping in multi-agent reinforcement learning as a way to incorporate domain knowledge about good strategies. In theory [2], potential-based reward shaping does not alter the Nash Equilibria of a stochastic game, only the exploration of the shaped agent. We demonstrate empirically the performance of state-based and state-action-based reward shaping in...

Publication date: 2013